Overview

Dataset statistics

Number of variables: 22
Number of observations: 389003
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 76.3 MiB
Average record size in memory: 205.7 B
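
The average record size appears to be the total in-memory size divided by the number of observations. A minimal Python sketch that checks this against the figures above; because 76.3 MiB is itself a rounded value, the match can only be approximate.

```python
# Consistency check: average record size = total memory / number of rows.
# Both inputs are copied from the dataset statistics; 76.3 MiB is rounded.
MIB = 1024 ** 2  # bytes per mebibyte

total_bytes = 76.3 * MIB
n_observations = 389003

avg_record_bytes = total_bytes / n_observations
print(f"{avg_record_bytes:.1f} B per record")
```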

Variable types

DateTime: 2
Numeric: 9
Categorical: 11

Alerts

merchant has a high cardinality: 693 distinct values [High cardinality]
first has a high cardinality: 352 distinct values [High cardinality]
last has a high cardinality: 481 distinct values [High cardinality]
street has a high cardinality: 982 distinct values [High cardinality]
city has a high cardinality: 894 distinct values [High cardinality]
job has a high cardinality: 494 distinct values [High cardinality]
trans_num has a high cardinality: 389003 distinct values [High cardinality]
cc_num is highly overall correlated with gender and 1 other fields [High correlation]
zip is highly overall correlated with state and 4 other fields [High correlation]
lat is highly overall correlated with state and 4 other fields [High correlation]
long is highly overall correlated with state and 4 other fields [High correlation]
merch_lat is highly overall correlated with state and 4 other fields [High correlation]
merch_long is highly overall correlated with state and 4 other fields [High correlation]
gender is highly overall correlated with cc_num and 1 other fields [High correlation]
state is highly overall correlated with zip and 5 other fields [High correlation]
city_pop is highly overall correlated with state [High correlation]
is_fraud is highly imbalanced (94.9%) [Imbalance]
amt is highly skewed (γ1 = 55.16028174) [Skewed]
trans_num is uniformly distributed [Uniform]
trans_num has unique values [Unique]
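
The imbalance alert reports a score, not a class ratio. The sketch below assumes the score is one minus the normalized Shannon entropy of the class counts, which is our reading of how ydata-profiling defines it and not something this report states. The function name and the example counts are illustrative and are not taken from the data.

```python
import math

def imbalance_score(counts):
    """Return 1 - H(counts) / log2(k): 0 when balanced, near 1 when one class dominates."""
    counts = [c for c in counts if c > 0]  # empty classes carry no entropy
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no second class to be imbalanced against
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts)
    return 1.0 - entropy / math.log2(k)

# Hypothetical class counts, for illustration only.
print(imbalance_score([500, 500]))
print(imbalance_score([990, 10]))
```

Under this definition, a score of 94.9% for a binary target implies a minority class well below one percent of the rows.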

Reproduction

Analysis started: 2023-05-17 00:06:20.706651
Analysis finished: 2023-05-17 00:08:22.813799
Duration: 2 minutes and 2.11 seconds
Software version: ydata-profiling v4.1.2
Download configuration: config.json
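
A report of this kind can be regenerated with ydata-profiling. A minimal sketch, assuming the data is available as a CSV file: the function `profile_csv` and both file names are hypothetical, and the settings of the original run live in the config.json listed above, which is not reproduced here.

```python
def profile_csv(csv_path, html_path, title="Profiling report"):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title=title)
    report.to_file(html_path)
    return html_path

# Hypothetical usage; "transactions.csv" is a placeholder, not the original file:
# profile_csv("transactions.csv", "report.html")
```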

Variables

Distinct: 386997
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Memory size14.0 MiB
Minimum: 2019-01-01 00:00:51
Maximum: 2020-06-21 12:13:36
Histogram with fixed size bins (bins=50)
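
The percentage columns throughout the report follow from the counts and the 389003 observations. The sketch below reproduces the formatting seen in this report, including the "< 0.1%" label. The rounding rule is inferred from rows in the report, not taken from the library source, and `format_share` is a hypothetical helper.

```python
N_OBSERVATIONS = 389003  # number of observations from the dataset statistics

def format_share(count, total=N_OBSERVATIONS):
    """Format count/total as a one-decimal percentage, report style."""
    pct = 100.0 * count / total
    text = f"{pct:.1f}%"
    # Non-zero shares that round down to 0.0 are shown as "< 0.1%".
    if count > 0 and text == "0.0%":
        return "< 0.1%"
    return text

print(format_share(386997))  # the Distinct count listed above
print(format_share(982))
print(format_share(14))
```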

cc_num
Real number (ℝ)

Distinct: 982
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.185546 × 10^17
Minimum: 6.0416207 × 10^10
Maximum: 4.9923464 × 10^18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size14.0 MiB

Quantile statistics

Minimum: 6.0416207 × 10^10
5-th percentile: 6.3048488 × 10^11
Q1: 1.8004295 × 10^14
median: 3.5214173 × 10^15
Q3: 4.6422555 × 10^15
95-th percentile: 4.5025395 × 10^18
Maximum: 4.9923464 × 10^18
Range: 4.9923463 × 10^18
Interquartile range (IQR): 4.4622125 × 10^15

Descriptive statistics

Standard deviation: 1.3111107 × 10^18
Coefficient of variation (CV): 3.1324724
Kurtosis: 6.1495469
Mean: 4.185546 × 10^17
Median Absolute Deviation (MAD): 3.0764709 × 10^15
Skewness: 2.8465774
Sum: 8.0315113 × 10^18
Variance: 1.7190114 × 10^36
Monotonicity: Not monotonic
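
The reported Sum for cc_num does not agree with Mean × number of observations, which is worth knowing before reusing that figure. One plausible explanation, stated here as an assumption, is 64-bit integer wraparound during summation. The sketch below only checks the arithmetic with values copied from this report.

```python
# Values copied from the cc_num statistics in this report.
mean = 4.185546e17
n_observations = 389003
reported_sum = 8.0315113e18

expected_sum = mean * n_observations  # what the sum should be, given the mean
INT64_MAX = 2 ** 63 - 1               # largest signed 64-bit integer

print(f"expected sum ~ {expected_sum:.3e}")
print(f"reported sum = {reported_sum:.3e}")
print("expected sum exceeds int64 range:", expected_sum > INT64_MAX)
```

Because cc_num is an identifier and not a quantity, its sum, mean, and variance carry little meaning in any case. Loading it as a string or categorical column would sidestep the issue.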
Histogram with fixed size bins (bins=50)
Common values

Value                  Count     Frequency (%)
4.512828415 × 10^18    1009      0.3%
3.764452668 × 10^14    976       0.3%
6.304249875 × 10^11    968       0.2%
3.575789282 × 10^15    958       0.2%
6.534628261 × 10^15    956       0.2%
2.720433096 × 10^15    946       0.2%
6.011438889 × 10^15    945       0.2%
4.792627764 × 10^18    945       0.2%
3.54510934 × 10^15     944       0.2%
4.716561797 × 10^15    942       0.2%
Other values (972)     379414    97.5%

Minimum 10 values

Value                  Count     Frequency (%)
6.041620718 × 10^10    436       0.1%
6.042292873 × 10^10    439       0.1%
6.042309813 × 10^10    162       < 0.1%
6.042785159 × 10^10    155       < 0.1%
6.048700208 × 10^10    151       < 0.1%
6.04905963 × 10^10     294       0.1%
6.049559311 × 10^10    152       < 0.1%
5.018029536 × 10^11    509       0.1%
5.018181333 × 10^11    2         < 0.1%
5.018282048 × 10^11    145       < 0.1%

Maximum 10 values

Value                  Count     Frequency (%)
4.992346398 × 10^18    654       0.2%
4.989847571 × 10^18    295       0.1%
4.980323468 × 10^18    151       < 0.1%
4.973530368 × 10^18    309       0.1%
4.958589672 × 10^18    458       0.1%
4.95682899 × 10^18     814       0.2%
4.911818931 × 10^18    4         < 0.1%
4.906628656 × 10^18    780       0.2%
4.897067971 × 10^18    323       0.1%
4.890424427 × 10^18    459       0.1%

merchant
Categorical

Distinct: 693
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size14.0 MiB
fraud_Kilback LLC
 
1268
fraud_Cormier LLC
 
1093
fraud_Schumm PLC
 
1081
fraud_Dickinson Ltd
 
1061
fraud_Kuhn LLC
 
1034
Other values (688)
383466 

Length

Max length: 43
Median length: 36
Mean length: 23.133667
Min length: 13

Characters and Unicode

Total characters: 8999066
Distinct characters: 55
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
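
One way to reproduce a category breakdown like this is Python's standard `unicodedata` module, which returns the two-letter general category of each character (for example `Ll` for Lowercase Letter, `Pc` for Connector Punctuation). A minimal sketch on one merchant value from this report; it is not necessarily how ydata-profiling computes its counts, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters in text by Unicode general category (Ll, Lu, Pc, ...)."""
    return Counter(unicodedata.category(ch) for ch in text)

counts = category_counts("fraud_Hills-Witting")
for category, n in sorted(counts.items()):
    print(category, n)
```

Summing such counters over every value in a column would give per-category totals comparable to the tables in this section.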

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: fraud_Hills-Witting
2nd row: fraud_Kling Inc
3rd row: fraud_DuBuque LLC
4th row: fraud_Grimes LLC
5th row: fraud_Ullrich Ltd

Common Values

ValueCountFrequency (%)
fraud_Kilback LLC 1268
 
0.3%
fraud_Cormier LLC 1093
 
0.3%
fraud_Schumm PLC 1081
 
0.3%
fraud_Dickinson Ltd 1061
 
0.3%
fraud_Kuhn LLC 1034
 
0.3%
fraud_Boyer PLC 999
 
0.3%
fraud_Prohaska-Murray 838
 
0.2%
fraud_Olson, Becker and Koch 834
 
0.2%
fraud_Stroman, Hudson and Erdman 834
 
0.2%
fraud_Emard Inc 831
 
0.2%
Other values (683) 379130
97.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
and 142354
 
15.7%
llc 29295
 
3.2%
inc 27595
 
3.0%
sons 22046
 
2.4%
ltd 21298
 
2.3%
plc 19732
 
2.2%
group 15283
 
1.7%
fraud_kutch 3191
 
0.4%
fraud_schaefer 2855
 
0.3%
fraud_streich 2761
 
0.3%
Other values (804) 620812
68.4%

Most occurring characters

ValueCountFrequency (%)
a 872715
 
9.7%
r 808746
 
9.0%
d 641831
 
7.1%
e 559186
 
6.2%
u 557157
 
6.2%
n 530288
 
5.9%
(space) 518219
 
5.8%
f 419227
 
4.7%
_ 389003
 
4.3%
o 339376
 
3.8%
Other values (45) 3363318
37.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6810376
75.7%
Uppercase Letter 1019016
 
11.3%
Space Separator 518219
 
5.8%
Connector Punctuation 389003
 
4.3%
Dash Punctuation 133446
 
1.5%
Other Punctuation 129006
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 872715
12.8%
r 808746
11.9%
d 641831
9.4%
e 559186
 
8.2%
u 557157
 
8.2%
n 530288
 
7.8%
f 419227
 
6.2%
o 339376
 
5.0%
i 324292
 
4.8%
t 262990
 
3.9%
Other values (15) 1494568
21.9%
Uppercase Letter
ValueCountFrequency (%)
L 142762
14.0%
C 93391
 
9.2%
S 90718
 
8.9%
B 83325
 
8.2%
H 78015
 
7.7%
K 65247
 
6.4%
G 57896
 
5.7%
R 54548
 
5.4%
M 53836
 
5.3%
P 47587
 
4.7%
Other values (15) 251691
24.7%
Other Punctuation
ValueCountFrequency (%)
, 120308
93.3%
' 8698
 
6.7%
Space Separator
ValueCountFrequency (%)
(space) 518219
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 389003
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 133446
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7829392
87.0%
Common 1169674
 
13.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 872715
 
11.1%
r 808746
 
10.3%
d 641831
 
8.2%
e 559186
 
7.1%
u 557157
 
7.1%
n 530288
 
6.8%
f 419227
 
5.4%
o 339376
 
4.3%
i 324292
 
4.1%
t 262990
 
3.4%
Other values (40) 2513584
32.1%
Common
ValueCountFrequency (%)
(space) 518219
44.3%
_ 389003
33.3%
- 133446
 
11.4%
, 120308
 
10.3%
' 8698
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8999066
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 872715
 
9.7%
r 808746
 
9.0%
d 641831
 
7.1%
e 559186
 
6.2%
u 557157
 
6.2%
n 530288
 
5.9%
(space) 518219
 
5.8%
f 419227
 
4.7%
_ 389003
 
4.3%
o 339376
 
3.8%
Other values (45) 3363318
37.4%

category
Categorical

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size14.0 MiB
gas_transport
39239 
grocery_pos
37261 
home
37029 
shopping_pos
34789 
kids_pets
33961 
Other values (9)
206724 

Length

Max length: 14
Median length: 12
Mean length: 10.523646
Min length: 4

Characters and Unicode

Total characters: 4093730
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: shopping_net
2nd row: gas_transport
3rd row: grocery_pos
4th row: entertainment
5th row: kids_pets

Common Values

ValueCountFrequency (%)
gas_transport 39239
10.1%
grocery_pos 37261
9.6%
home 37029
9.5%
shopping_pos 34789
8.9%
kids_pets 33961
8.7%
shopping_net 29244
7.5%
entertainment 28341
7.3%
food_dining 27464
7.1%
personal_care 27394
 
7.0%
health_fitness 25748
 
6.6%
Other values (4) 68533
17.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
gas_transport 39239
10.1%
grocery_pos 37261
9.6%
home 37029
9.5%
shopping_pos 34789
8.9%
kids_pets 33961
8.7%
shopping_net 29244
7.5%
entertainment 28341
7.3%
food_dining 27464
7.1%
personal_care 27394
 
7.0%
health_fitness 25748
 
6.6%
Other values (4) 68533
17.6%

Most occurring characters

ValueCountFrequency (%)
s 427903
10.5%
e 387087
9.5%
o 369314
9.0%
n 358077
8.7%
p 324524
 
7.9%
t 322921
 
7.9%
_ 311382
 
7.6%
r 275512
 
6.7%
i 249727
 
6.1%
a 199606
 
4.9%
Other values (10) 867677
21.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3782348
92.4%
Connector Punctuation 311382
 
7.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 427903
11.3%
e 387087
10.2%
o 369314
9.8%
n 358077
9.5%
p 324524
8.6%
t 322921
8.5%
r 275512
7.3%
i 249727
 
6.6%
a 199606
 
5.3%
g 181563
 
4.8%
Other values (9) 686114
18.1%
Connector Punctuation
ValueCountFrequency (%)
_ 311382
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3782348
92.4%
Common 311382
 
7.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 427903
11.3%
e 387087
10.2%
o 369314
9.8%
n 358077
9.5%
p 324524
8.6%
t 322921
8.5%
r 275512
7.3%
i 249727
 
6.6%
a 199606
 
5.3%
g 181563
 
4.8%
Other values (9) 686114
18.1%
Common
ValueCountFrequency (%)
_ 311382
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4093730
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 427903
10.5%
e 387087
9.5%
o 369314
9.0%
n 358077
8.7%
p 324524
 
7.9%
t 322921
 
7.9%
_ 311382
 
7.6%
r 275512
 
6.7%
i 249727
 
6.1%
a 199606
 
4.9%
Other values (10) 867677
21.2%

amt
Real number (ℝ)

Distinct: 33399
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.249035
Minimum: 1
Maximum: 28948.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size14.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2.45
Q1: 9.64
median: 47.5
Q3: 83.06
95-th percentile: 195.689
Maximum: 28948.9
Range: 28947.9
Interquartile range (IQR): 73.42

Descriptive statistics

Standard deviation: 169.32887
Coefficient of variation (CV): 2.4104086
Kurtosis: 6777.7869
Mean: 70.249035
Median Absolute Deviation (MAD): 37.5
Skewness: 55.160282
Sum: 27327085
Variance: 28672.268
Monotonicity: Not monotonic
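
Several of the amt statistics are derived from others, so they can be cross-checked. The sketch below recomputes the range, IQR, coefficient of variation, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation. All inputs are copied from this report and are rounded, so only approximate agreement should be expected.

```python
# Inputs copied from the amt statistics in this report.
q1, q3 = 9.64, 83.06
minimum, maximum = 1.0, 28948.9
mean, std = 70.249035, 169.32887

value_range = maximum - minimum  # reported Range: 28947.9
iqr = q3 - q1                    # reported IQR: 73.42
cv = std / mean                  # reported CV: 2.4104086
variance = std ** 2              # reported Variance: 28672.268

print(f"range={value_range:.1f} iqr={iqr:.2f} cv={cv:.7f} variance={variance:.3f}")
```

A CV above 2 and a mean (70.25) well above the median (47.5) are both consistent with the skewness alert raised for amt.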
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1.1 180
 
< 0.1%
1.14 179
 
< 0.1%
1.31 167
 
< 0.1%
1.12 163
 
< 0.1%
1.03 161
 
< 0.1%
1.08 161
 
< 0.1%
1.25 158
 
< 0.1%
1.4 158
 
< 0.1%
1.17 157
 
< 0.1%
1.16 156
 
< 0.1%
Other values (33389) 387363
99.6%
ValueCountFrequency (%)
1 59
 
< 0.1%
1.01 154
< 0.1%
1.02 142
< 0.1%
1.03 161
< 0.1%
1.04 140
< 0.1%
1.05 155
< 0.1%
1.06 134
< 0.1%
1.07 145
< 0.1%
1.08 161
< 0.1%
1.09 138
< 0.1%
ValueCountFrequency (%)
28948.9 1
< 0.1%
27119.77 1
< 0.1%
26544.12 1
< 0.1%
17897.24 1
< 0.1%
15305.95 1
< 0.1%
14849.74 1
< 0.1%
14238.11 1
< 0.1%
12176.55 1
< 0.1%
12025.3 1
< 0.1%
11872.21 1
< 0.1%

first
Categorical

Distinct: 352
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size14.0 MiB
Christopher
 
7987
Robert
 
6604
James
 
6151
Jessica
 
6108
Michael
 
6013
Other values (347)
356140 

Length

Max length: 11
Median length: 9
Mean length: 6.0823515
Min length: 3

Characters and Unicode

Total characters: 2366053
Distinct characters: 49
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Amber
2nd row: Robert
3rd row: Kathryn
4th row: Phillip
5th row: Vanessa

Common Values

ValueCountFrequency (%)
Christopher 7987
 
2.1%
Robert 6604
 
1.7%
James 6151
 
1.6%
Jessica 6108
 
1.6%
Michael 6013
 
1.5%
David 5965
 
1.5%
Jennifer 5168
 
1.3%
William 4986
 
1.3%
John 4965
 
1.3%
Mary 4825
 
1.2%
Other values (342) 330231
84.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
christopher 7987
 
2.1%
robert 6604
 
1.7%
james 6151
 
1.6%
jessica 6108
 
1.6%
michael 6013
 
1.5%
david 5965
 
1.5%
jennifer 5168
 
1.3%
william 4986
 
1.3%
john 4965
 
1.3%
mary 4825
 
1.2%
Other values (342) 330231
84.9%

Most occurring characters

ValueCountFrequency (%)
a 301673
 
12.8%
e 258513
 
10.9%
i 185377
 
7.8%
n 183962
 
7.8%
r 182008
 
7.7%
l 116715
 
4.9%
h 103744
 
4.4%
s 97785
 
4.1%
t 93652
 
4.0%
o 80985
 
3.4%
Other values (39) 761639
32.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1977050
83.6%
Uppercase Letter 389003
 
16.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 301673
15.3%
e 258513
13.1%
i 185377
9.4%
n 183962
9.3%
r 182008
9.2%
l 116715
 
5.9%
h 103744
 
5.2%
s 97785
 
4.9%
t 93652
 
4.7%
o 80985
 
4.1%
Other values (16) 372636
18.8%
Uppercase Letter
ValueCountFrequency (%)
J 65790
16.9%
M 43564
11.2%
S 34243
8.8%
A 33507
8.6%
C 31999
8.2%
K 25703
 
6.6%
D 25634
 
6.6%
R 21292
 
5.5%
T 19823
 
5.1%
L 18831
 
4.8%
Other values (13) 68617
17.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 2366053
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 301673
 
12.8%
e 258513
 
10.9%
i 185377
 
7.8%
n 183962
 
7.8%
r 182008
 
7.7%
l 116715
 
4.9%
h 103744
 
4.4%
s 97785
 
4.1%
t 93652
 
4.0%
o 80985
 
3.4%
Other values (39) 761639
32.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2366053
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 301673
 
12.8%
e 258513
 
10.9%
i 185377
 
7.8%
n 183962
 
7.8%
r 182008
 
7.7%
l 116715
 
4.9%
h 103744
 
4.4%
s 97785
 
4.1%
t 93652
 
4.0%
o 80985
 
3.4%
Other values (39) 761639
32.2%

last
Categorical

Distinct: 481
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size14.0 MiB
Smith
 
8711
Williams
 
6981
Davis
 
6497
Johnson
 
5844
Rodriguez
 
5208
Other values (476)
355762 

Length

Max length: 11
Median length: 10
Mean length: 6.107228
Min length: 2

Characters and Unicode

Total characters: 2375730
Distinct characters: 48
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Lewis
2nd row: James
3rd row: Smith
4th row: Robertson
5th row: Anderson

Common Values

ValueCountFrequency (%)
Smith 8711
 
2.2%
Williams 6981
 
1.8%
Davis 6497
 
1.7%
Johnson 5844
 
1.5%
Rodriguez 5208
 
1.3%
Martinez 4505
 
1.2%
Jones 4151
 
1.1%
Lewis 3749
 
1.0%
Gonzalez 3549
 
0.9%
Miller 3513
 
0.9%
Other values (471) 336295
86.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
smith 8711
 
2.2%
williams 6981
 
1.8%
davis 6497
 
1.7%
johnson 5844
 
1.5%
rodriguez 5208
 
1.3%
martinez 4505
 
1.2%
jones 4151
 
1.1%
lewis 3749
 
1.0%
gonzalez 3549
 
0.9%
miller 3513
 
0.9%
Other values (471) 336295
86.5%

Most occurring characters

ValueCountFrequency (%)
e 236267
 
9.9%
r 197779
 
8.3%
a 194214
 
8.2%
n 182474
 
7.7%
o 174868
 
7.4%
l 146587
 
6.2%
s 145591
 
6.1%
i 130221
 
5.5%
t 86481
 
3.6%
h 68251
 
2.9%
Other values (38) 812997
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1986727
83.6%
Uppercase Letter 389003
 
16.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 236267
11.9%
r 197779
10.0%
a 194214
9.8%
n 182474
9.2%
o 174868
8.8%
l 146587
 
7.4%
s 145591
 
7.3%
i 130221
 
6.6%
t 86481
 
4.4%
h 68251
 
3.4%
Other values (15) 423994
21.3%
Uppercase Letter
ValueCountFrequency (%)
M 47471
12.2%
W 32079
 
8.2%
S 31511
 
8.1%
C 27961
 
7.2%
B 25297
 
6.5%
R 25089
 
6.4%
H 24370
 
6.3%
G 22694
 
5.8%
J 21160
 
5.4%
P 19706
 
5.1%
Other values (13) 111665
28.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 2375730
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 236267
 
9.9%
r 197779
 
8.3%
a 194214
 
8.2%
n 182474
 
7.7%
o 174868
 
7.4%
l 146587
 
6.2%
s 145591
 
6.1%
i 130221
 
5.5%
t 86481
 
3.6%
h 68251
 
2.9%
Other values (38) 812997
34.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2375730
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 236267
 
9.9%
r 197779
 
8.3%
a 194214
 
8.2%
n 182474
 
7.7%
o 174868
 
7.4%
l 146587
 
6.2%
s 145591
 
6.1%
i 130221
 
5.5%
t 86481
 
3.6%
h 68251
 
2.9%
Other values (38) 812997
34.2%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size14.0 MiB
F
212679 
M
176324 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 389003
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: M
3rd row: F
4th row: M
5th row: F

Common Values

ValueCountFrequency (%)
F 212679
54.7%
M 176324
45.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
f 212679
54.7%
m 176324
45.3%

Most occurring characters

ValueCountFrequency (%)
F 212679
54.7%
M 176324
45.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 389003
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F 212679
54.7%
M 176324
45.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 389003
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
F 212679
54.7%
M 176324
45.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 389003
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F 212679
54.7%
M 176324
45.3%

street
Categorical

Distinct: 982
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size14.0 MiB
864 Reynolds Plains
 
1009
372 Jeffrey Course
 
976
2870 Bean Terrace Apt. 756
 
968
7618 Gonzales Mission
 
958
29606 Martinez Views Suite 653
 
956
Other values (977)
384136 

Length

Max length: 35
Median length: 29
Mean length: 22.225983
Min length: 12

Characters and Unicode

Total characters: 8645974
Distinct characters: 62
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): < 0.1%

Sample

1st row: 6296 John Keys Suite 858
2nd row: 18316 Cannon Place
3rd row: 19838 Tonya Prairie Apt. 947
4th row: 85344 Smith Gateway Apt. 280
5th row: 21178 Brittney Locks

Common Values

ValueCountFrequency (%)
864 Reynolds Plains 1009
 
0.3%
372 Jeffrey Course 976
 
0.3%
2870 Bean Terrace Apt. 756 968
 
0.2%
7618 Gonzales Mission 958
 
0.2%
29606 Martinez Views Suite 653 956
 
0.2%
854 Walker Dale Suite 488 946
 
0.2%
40624 Rebecca Spurs 945
 
0.2%
7952 Karen Pike 945
 
0.2%
8030 Beck Motorway 944
 
0.2%
11014 Chad Lake Apt. 573 942
 
0.2%
Other values (972) 379414
97.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
apt 98165
 
6.3%
suite 91491
 
5.9%
island 6981
 
0.5%
michael 5625
 
0.4%
station 5376
 
0.3%
common 5349
 
0.3%
islands 5319
 
0.3%
david 5215
 
0.3%
brooks 5105
 
0.3%
fields 4966
 
0.3%
Other values (1938) 1312729
84.9%

Most occurring characters

ValueCountFrequency (%)
(space) 1157318
 
13.4%
e 537671
 
6.2%
a 436559
 
5.0%
i 389799
 
4.5%
t 374730
 
4.3%
r 331150
 
3.8%
n 320223
 
3.7%
s 310614
 
3.6%
l 266766
 
3.1%
o 262407
 
3.0%
Other values (52) 4258737
49.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 4325186
50.0%
Decimal Number 2097643
24.3%
Space Separator 1157318
 
13.4%
Uppercase Letter 967662
 
11.2%
Other Punctuation 98165
 
1.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 537671
12.4%
a 436559
10.1%
i 389799
9.0%
t 374730
8.7%
r 331150
 
7.7%
n 320223
 
7.4%
s 310614
 
7.2%
l 266766
 
6.2%
o 262407
 
6.1%
u 184498
 
4.3%
Other values (16) 910769
21.1%
Uppercase Letter
ValueCountFrequency (%)
S 168431
17.4%
A 126251
13.0%
M 77421
 
8.0%
C 67442
 
7.0%
P 59081
 
6.1%
R 55929
 
5.8%
B 44661
 
4.6%
F 42866
 
4.4%
L 39577
 
4.1%
J 36221
 
3.7%
Other values (14) 249782
25.8%
Decimal Number
ValueCountFrequency (%)
5 224471
10.7%
3 221250
10.5%
2 220401
10.5%
7 211589
10.1%
1 208271
9.9%
8 207692
9.9%
6 203782
9.7%
0 202913
9.7%
4 200145
9.5%
9 197129
9.4%
Space Separator
ValueCountFrequency (%)
(space) 1157318
100.0%
Other Punctuation
ValueCountFrequency (%)
. 98165
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 5292848
61.2%
Common 3353126
38.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 537671
 
10.2%
a 436559
 
8.2%
i 389799
 
7.4%
t 374730
 
7.1%
r 331150
 
6.3%
n 320223
 
6.1%
s 310614
 
5.9%
l 266766
 
5.0%
o 262407
 
5.0%
u 184498
 
3.5%
Other values (40) 1878431
35.5%
Common
ValueCountFrequency (%)
(space) 1157318
34.5%
5 224471
 
6.7%
3 221250
 
6.6%
2 220401
 
6.6%
7 211589
 
6.3%
1 208271
 
6.2%
8 207692
 
6.2%
6 203782
 
6.1%
0 202913
 
6.1%
4 200145
 
6.0%
Other values (2) 295294
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8645974
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 1157318
 
13.4%
e 537671
 
6.2%
a 436559
 
5.0%
i 389799
 
4.5%
t 374730
 
4.3%
r 331150
 
3.8%
n 320223
 
3.7%
s 310614
 
3.6%
l 266766
 
3.1%
o 262407
 
3.0%
Other values (52) 4258737
49.3%

city
Categorical

Distinct: 894
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size14.0 MiB
Birmingham
 
1649
Phoenix
 
1539
Meridian
 
1519
Utica
 
1502
San Antonio
 
1464
Other values (889)
381330 

Length

Max length: 25
Median length: 21
Mean length: 8.6494706
Min length: 3

Characters and Unicode

Total characters: 3364670
Distinct characters: 52
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): < 0.1%

Sample

1st row: Pembroke Township
2nd row: Newport
3rd row: Rocky Mount
4th row: Harrodsburg
5th row: Prosperity

Common Values

ValueCountFrequency (%)
Birmingham 1649
 
0.4%
Phoenix 1539
 
0.4%
Meridian 1519
 
0.4%
Utica 1502
 
0.4%
San Antonio 1464
 
0.4%
Warren 1419
 
0.4%
Thomas 1393
 
0.4%
Conway 1355
 
0.3%
Cleveland 1338
 
0.3%
Burbank 1293
 
0.3%
Other values (884) 374532
96.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
city 6387
 
1.3%
west 5921
 
1.2%
north 4349
 
0.9%
saint 4344
 
0.9%
falls 3886
 
0.8%
new 3627
 
0.7%
lake 3392
 
0.7%
mount 3375
 
0.7%
san 2989
 
0.6%
springs 2604
 
0.5%
Other values (918) 444563
91.6%

Most occurring characters

ValueCountFrequency (%)
e 327388
 
9.7%
a 280684
 
8.3%
n 246472
 
7.3%
o 245079
 
7.3%
l 233863
 
7.0%
r 224644
 
6.7%
i 210699
 
6.3%
t 180033
 
5.4%
s 134255
 
4.0%
(space) 96434
 
2.9%
Other values (42) 1185119
35.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2782211
82.7%
Uppercase Letter 485731
 
14.4%
Space Separator 96434
 
2.9%
Dash Punctuation 294
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 327388
11.8%
a 280684
10.1%
n 246472
8.9%
o 245079
8.8%
l 233863
 
8.4%
r 224644
 
8.1%
i 210699
 
7.6%
t 180033
 
6.5%
s 134255
 
4.8%
d 92355
 
3.3%
Other values (15) 606739
21.8%
Uppercase Letter
ValueCountFrequency (%)
C 47053
 
9.7%
M 43980
 
9.1%
S 40999
 
8.4%
B 39792
 
8.2%
H 34909
 
7.2%
W 28700
 
5.9%
P 27613
 
5.7%
L 25983
 
5.3%
R 23599
 
4.9%
A 22440
 
4.6%
Other values (15) 150663
31.0%
Space Separator
ValueCountFrequency (%)
(space) 96434
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 294
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3267942
97.1%
Common 96728
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 327388
 
10.0%
a 280684
 
8.6%
n 246472
 
7.5%
o 245079
 
7.5%
l 233863
 
7.2%
r 224644
 
6.9%
i 210699
 
6.4%
t 180033
 
5.5%
s 134255
 
4.1%
d 92355
 
2.8%
Other values (40) 1092470
33.4%
Common
ValueCountFrequency (%)
(space) 96434
99.7%
- 294
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3364670
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 327388
 
9.7%
a 280684
 
8.3%
n 246472
 
7.3%
o 245079
 
7.3%
l 233863
 
7.0%
r 224644
 
6.7%
i 210699
 
6.3%
t 180033
 
5.4%
s 134255
 
4.0%
(space) 96434
 
2.9%
Other values (42) 1185119
35.2%

state
Categorical

Distinct: 50
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size14.0 MiB
TX
28360 
NY
 
25049
PA
 
23990
CA
 
16970
OH
 
13915
Other values (45)
280719 

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 778006
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IL
2nd row: ME
3rd row: MO
4th row: IN
5th row: SC

Common Values

ValueCountFrequency (%)
TX 28360
 
7.3%
NY 25049
 
6.4%
PA 23990
 
6.2%
CA 16970
 
4.4%
OH 13915
 
3.6%
MI 13860
 
3.6%
IL 12972
 
3.3%
FL 12878
 
3.3%
AL 12223
 
3.1%
MO 11346
 
2.9%
Other values (40) 217440
55.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
tx 28360
 
7.3%
ny 25049
 
6.4%
pa 23990
 
6.2%
ca 16970
 
4.4%
oh 13915
 
3.6%
mi 13860
 
3.6%
il 12972
 
3.3%
fl 12878
 
3.3%
al 12223
 
3.1%
mo 11346
 
2.9%
Other values (40) 217440
55.9%

Most occurring characters

ValueCountFrequency (%)
A 106791
13.7%
N 85557
 
11.0%
M 66284
 
8.5%
I 54610
 
7.0%
T 46005
 
5.9%
L 44397
 
5.7%
O 42889
 
5.5%
C 42359
 
5.4%
Y 39311
 
5.1%
X 28360
 
3.6%
Other values (14) 221443
28.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 778006
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 106791
13.7%
N 85557
 
11.0%
M 66284
 
8.5%
I 54610
 
7.0%
T 46005
 
5.9%
L 44397
 
5.7%
O 42889
 
5.5%
C 42359
 
5.4%
Y 39311
 
5.1%
X 28360
 
3.6%
Other values (14) 221443
28.5%

Most occurring scripts

ValueCountFrequency (%)
Latin 778006
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A 106791
13.7%
N 85557
 
11.0%
M 66284
 
8.5%
I 54610
 
7.0%
T 46005
 
5.9%
L 44397
 
5.7%
O 42889
 
5.5%
C 42359
 
5.4%
Y 39311
 
5.1%
X 28360
 
3.6%
Other values (14) 221443
28.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 778006
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A 106791
13.7%
N 85557
 
11.0%
M 66284
 
8.5%
I 54610
 
7.0%
T 46005
 
5.9%
L 44397
 
5.7%
O 42889
 
5.5%
C 42359
 
5.4%
Y 39311
 
5.1%
X 28360
 
3.6%
Other values (14) 221443
28.5%

zip
Real number (ℝ)

Distinct: 969
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48763.684
Minimum: 1257
Maximum: 99783
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size14.0 MiB

Quantile statistics

Minimum: 1257
5-th percentile: 7208
Q1: 26237
median: 48174
Q3: 72011
95-th percentile: 94569
Maximum: 99783
Range: 98526
Interquartile range (IQR): 45774

Descriptive statistics

Standard deviation: 26892.358
Coefficient of variation (CV): 0.55148331
Kurtosis: -1.0945937
Mean: 48763.684
Median Absolute Deviation (MAD): 23058
Skewness: 0.079981952
Sum: 1.8969219 × 10^10
Variance: 7.231989 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
48088 1116
 
0.3%
73754 1111
 
0.3%
34112 1069
 
0.3%
82514 1027
 
0.3%
15484 1009
 
0.3%
69165 976
 
0.3%
26292 968
 
0.2%
64019 958
 
0.2%
5461 956
 
0.2%
4287 946
 
0.2%
Other values (959) 378867
97.4%
ValueCountFrequency (%)
1257 587
0.2%
1330 305
 
0.1%
1535 161
 
< 0.1%
1545 295
 
0.1%
1612 179
 
< 0.1%
1843 771
0.2%
1844 644
0.2%
2180 168
 
< 0.1%
2630 669
0.2%
2908 159
 
< 0.1%
ValueCountFrequency (%)
99783 469
0.1%
99747 7
 
< 0.1%
99746 171
 
< 0.1%
99323 785
0.2%
99160 920
0.2%
99116 3
 
< 0.1%
99113 331
 
0.1%
99033 731
0.2%
98836 162
 
< 0.1%
98665 150
 
< 0.1%
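
Because zip is profiled as a number, codes that begin with zero appear with only four digits (the minimum is shown as 1257). A sketch of restoring the five-digit form, assuming these are US ZIP codes, which the state column suggests but the report does not state; `format_zip` is a hypothetical helper.

```python
def format_zip(value):
    """Render a numeric ZIP code as a zero-padded five-character string."""
    return f"{int(value):05d}"

# Values taken from the zip tables in this report.
for z in (1257, 5461, 99783):
    print(z, "->", format_zip(z))
```

Storing the column as text would also keep it out of numeric summaries such as mean and variance, which have no meaning for postal codes.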

lat
Real number (ℝ)

Distinct: 967
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.54639
Minimum: 20.0271
Maximum: 66.6933
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size14.0 MiB

Quantile statistics

Minimum: 20.0271
5-th percentile: 29.8826
Q1: 34.6205
median: 39.3716
Q3: 41.9404
95-th percentile: 45.8433
Maximum: 66.6933
Range: 46.6662
Interquartile range (IQR): 7.3199

Descriptive statistics

Standard deviation: 5.0849144
Coefficient of variation (CV): 0.13191675
Kurtosis: 0.82936072
Mean: 38.54639
Median Absolute Deviation (MAD): 3.3564
Skewness: -0.18657805
Sum: 14994661
Variance: 25.856355
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
42.5164 1116
 
0.3%
36.385 1111
 
0.3%
26.1184 1069
 
0.3%
43.0048 1027
 
0.3%
39.8936 1009
 
0.3%
41.1558 976
 
0.3%
39.1505 968
 
0.2%
38.7897 958
 
0.2%
44.3346 956
 
0.2%
44.0575 946
 
0.2%
Other values (957) 378867
97.4%
ValueCountFrequency (%)
20.0271 476
0.1%
20.0827 309
 
0.1%
24.6557 780
0.2%
26.1184 1069
0.3%
26.3304 177
 
< 0.1%
26.3771 159
 
< 0.1%
26.4215 907
0.2%
26.4722 760
0.2%
26.529 457
0.1%
26.6939 313
 
0.1%
ValueCountFrequency (%)
66.6933 7
 
< 0.1%
65.6899 171
 
< 0.1%
64.7556 469
0.1%
48.8878 920
0.2%
48.8856 621
0.2%
48.8328 479
0.1%
48.6669 339
 
0.1%
48.6031 897
0.2%
48.4786 586
0.2%
48.34 907
0.2%

long
Real number (ℝ)

Distinct: 968
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -90.216346
Minimum: -165.6723
Maximum: -67.9503
Zeros: 0
Zeros (%): 0.0%
Negative: 389003
Negative (%): 100.0%
Memory size14.0 MiB

Quantile statistics

Minimum: -165.6723
5-th percentile: -119.0825
Q1: -96.798
median: -87.4769
Q3: -80.158
95-th percentile: -73.5112
Maximum: -67.9503
Range: 97.722
Interquartile range (IQR): 16.64

Descriptive statistics

Standard deviation: 13.767459
Coefficient of variation (CV): -0.15260493
Kurtosis: 1.8805399
Mean: -90.216346
Median Absolute Deviation (MAD): 8.1527
Skewness: -1.1544279
Sum: -35094429
Variance: 189.54292
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-82.9832 1116
 
0.3%
-98.0727 1111
 
0.3%
-81.7361 1069
 
0.3%
-108.8964 1027
 
0.3%
-79.7856 1009
 
0.3%
-101.136 976
 
0.3%
-79.503 968
 
0.2%
-93.8702 958
 
0.2%
-73.098 956
 
0.2%
-69.9656 946
 
0.2%
Other values (958) 378867
97.4%
ValueCountFrequency (%)
-165.6723 469
0.1%
-156.292 171
 
< 0.1%
-155.488 309
0.1%
-155.3697 476
0.1%
-153.994 7
 
< 0.1%
-124.4409 314
0.1%
-124.2174 487
0.1%
-124.1587 329
0.1%
-124.1437 456
0.1%
-123.9743 626
0.2%
ValueCountFrequency (%)
-67.9503 608
0.2%
-68.5565 293
 
0.1%
-69.2675 155
 
< 0.1%
-69.4828 647
0.2%
-69.9576 167
 
< 0.1%
-69.9656 946
0.2%
-70.1031 3
 
< 0.1%
-70.239 333
 
0.1%
-70.3001 669
0.2%
-70.3457 435
0.1%
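
The derived figures reported for long can be cross-checked against the primary ones. The sketch below recomputes the range, interquartile range, coefficient of variation, and variance from the values printed in the summary above. The variable names are illustrative and are not part of the report.

```python
# Values copied from the `long` summary above.
minimum, maximum = -165.6723, -67.9503
q1, q3 = -96.798, -80.158
mean, std = -90.216346, 13.767459

value_range = maximum - minimum   # reported as 97.722
iqr = q3 - q1                     # reported as 16.64
cv = std / mean                   # reported as -0.15260493
variance = std ** 2               # reported as 189.54292

print(round(value_range, 3), round(iqr, 2), round(cv, 4), round(variance, 2))
```

The coefficient of variation is negative only because the mean longitude is negative; its magnitude is what describes the relative spread.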

city_pop
Real number (ℝ)

Distinct878
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean88362.409
Minimum23
Maximum2906700
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.0 MiB

Quantile statistics

Minimum23
5-th percentile139
Q1743
median2456
Q320328
95-th percentile518429
Maximum2906700
Range2906677
Interquartile range (IQR)19585

Descriptive statistics

Standard deviation300908.65
Coefficient of variation (CV)3.4053921
Kurtosis38.040459
Mean88362.409
Median Absolute Deviation (MAD)2198
Skewness5.6235064
Sum3.4373242 × 10^10
Variance9.0546015 × 10^10
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
606 1598
 
0.4%
1312922 1539
 
0.4%
1595797 1464
 
0.4%
1766 1389
 
0.4%
241 1337
 
0.3%
302 1249
 
0.3%
2906700 1248
 
0.3%
276002 1246
 
0.3%
198 1208
 
0.3%
910148 1205
 
0.3%
Other values (868) 375520
96.5%
ValueCountFrequency (%)
23 680
0.2%
37 287
 
0.1%
43 599
0.2%
46 868
0.2%
47 150
 
< 0.1%
49 321
 
0.1%
51 318
 
0.1%
52 169
 
< 0.1%
53 778
0.2%
60 298
 
0.1%
ValueCountFrequency (%)
2906700 1248
0.3%
2504700 616
0.2%
2383912 149
 
< 0.1%
1595797 1464
0.4%
1577385 800
0.2%
1526206 1019
0.3%
1417793 2
 
< 0.1%
1382480 616
0.2%
1312922 1539
0.4%
1263321 1111
0.3%

job
Categorical

Distinct494
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size14.0 MiB
Film/video editor
 
2939
Exhibition designer
 
2691
Naval architect
 
2619
Surveyor, land/geomatics
 
2595
Designer, ceramics/pottery
 
2531
Other values (489)
375628 

Length

Max length59
Median length38
Mean length20.221502
Min length3

Characters and Unicode

Total characters7866225
Distinct characters53
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
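
As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories over a list of strings using Python's standard unicodedata module. The function name category_counts is hypothetical, and this is not necessarily how the profiling tool computes its tables. The two-letter codes map to the names used in this report: Ll is Lowercase Letter, Lu is Uppercase Letter, Zs is Space Separator, Po is Other Punctuation, Ps is Open Punctuation, and Pe is Close Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Ll', 'Lu', 'Zs') over all characters."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# Two job titles taken from the Common Values table of the `job` variable.
jobs = ["Film/video editor", "Surveyor, land/geomatics"]
print(category_counts(jobs))
```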

Unique

Unique1
Unique (%)< 0.1%

Sample

1st rowPsychotherapist, child
2nd rowLexicographer
3rd rowTax inspector
4th rowSocial researcher
5th rowArchaeologist

Common Values

ValueCountFrequency (%)
Film/video editor 2939
 
0.8%
Exhibition designer 2691
 
0.7%
Naval architect 2619
 
0.7%
Surveyor, land/geomatics 2595
 
0.7%
Designer, ceramics/pottery 2531
 
0.7%
Materials engineer 2529
 
0.7%
Systems developer 2325
 
0.6%
IT trainer 2320
 
0.6%
Environmental consultant 2241
 
0.6%
Financial adviser 2229
 
0.6%
Other values (484) 363984
93.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
engineer 39829
 
4.6%
officer 33394
 
3.9%
manager 18282
 
2.1%
scientist 16644
 
1.9%
designer 15634
 
1.8%
surveyor 14722
 
1.7%
teacher 11391
 
1.3%
psychologist 9874
 
1.1%
research 8893
 
1.0%
editor 8681
 
1.0%
Other values (456) 685976
79.5%

Most occurring characters

ValueCountFrequency (%)
e 841424
 
10.7%
i 715448
 
9.1%
r 660585
 
8.4%
a 543559
 
6.9%
t 533350
 
6.8%
n 529137
 
6.7%
474317
 
6.0%
o 447733
 
5.7%
s 432444
 
5.5%
c 396711
 
5.0%
Other values (43) 2291517
29.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 6833249
86.9%
Space Separator 474317
 
6.0%
Uppercase Letter 410738
 
5.2%
Other Punctuation 133629
 
1.7%
Close Punctuation 7146
 
0.1%
Open Punctuation 7146
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 841424
12.3%
i 715448
10.5%
r 660585
9.7%
a 543559
 
8.0%
t 533350
 
7.8%
n 529137
 
7.7%
o 447733
 
6.6%
s 432444
 
6.3%
c 396711
 
5.8%
l 299601
 
4.4%
Other values (16) 1433257
21.0%
Uppercase Letter
ValueCountFrequency (%)
C 46982
11.4%
E 43615
10.6%
P 42886
10.4%
S 41177
10.0%
T 34071
 
8.3%
M 26957
 
6.6%
A 26283
 
6.4%
F 20533
 
5.0%
D 17527
 
4.3%
R 16803
 
4.1%
Other values (11) 93904
22.9%
Other Punctuation
ValueCountFrequency (%)
, 93894
70.3%
/ 37331
 
27.9%
' 2404
 
1.8%
Space Separator
ValueCountFrequency (%)
474317
100.0%
Close Punctuation
ValueCountFrequency (%)
) 7146
100.0%
Open Punctuation
ValueCountFrequency (%)
( 7146
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 7243987
92.1%
Common 622238
 
7.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 841424
11.6%
i 715448
 
9.9%
r 660585
 
9.1%
a 543559
 
7.5%
t 533350
 
7.4%
n 529137
 
7.3%
o 447733
 
6.2%
s 432444
 
6.0%
c 396711
 
5.5%
l 299601
 
4.1%
Other values (37) 1843995
25.5%
Common
ValueCountFrequency (%)
474317
76.2%
, 93894
 
15.1%
/ 37331
 
6.0%
) 7146
 
1.1%
( 7146
 
1.1%
' 2404
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7866225
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 841424
 
10.7%
i 715448
 
9.1%
r 660585
 
8.4%
a 543559
 
6.9%
t 533350
 
6.8%
n 529137
 
6.7%
474317
 
6.0%
o 447733
 
5.7%
s 432444
 
5.5%
c 396711
 
5.0%
Other values (43) 2291517
29.1%

dob
Date

Distinct967
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size14.0 MiB
Minimum1924-10-30 00:00:00
Maximum2005-01-29 00:00:00
Histogram with fixed size bins (bins=50)

trans_num
Categorical

HIGH CARDINALITY  UNIFORM  UNIQUE 

Distinct389003
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Memory size14.0 MiB
d0e8265dc7e7b979c1533abebe95402c
 
1
dacdb6a22d977b027a8fb074a0447076
 
1
0d906bb5ceb81dcd3296e81b389341d5
 
1
ba345b8475217cf2e0230e1fbfa9e4ce
 
1
c12970404cdc90e85349479ec5fa3909
 
1
Other values (388998)
388998 

Length

Max length32
Median length32
Mean length32
Min length32

Characters and Unicode

Total characters12448096
Distinct characters16
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique389003
Unique (%)100.0%

Sample

1st rowd0e8265dc7e7b979c1533abebe95402c
2nd row71a947bf4d90e76e4d5b9a5f1b1ec8b4
3rd rowdfda32052a68f1452b1190b182672cb0
4th row8858dbc699716a100343e8402c3d2d17
5th rowb73f49b5c7081b864f1ba77678d86fce

Common Values

ValueCountFrequency (%)
d0e8265dc7e7b979c1533abebe95402c 1
 
< 0.1%
dacdb6a22d977b027a8fb074a0447076 1
 
< 0.1%
0d906bb5ceb81dcd3296e81b389341d5 1
 
< 0.1%
ba345b8475217cf2e0230e1fbfa9e4ce 1
 
< 0.1%
c12970404cdc90e85349479ec5fa3909 1
 
< 0.1%
0f8fef195889cbce74e53a0cf362e3a3 1
 
< 0.1%
6124557b02755ab42d8cd6aec1ab18b6 1
 
< 0.1%
39ba98c66e9ca69b98eef0b0fbc95abb 1
 
< 0.1%
8c50ebb14119b5d29d4db8ed3e300b28 1
 
< 0.1%
5eb79aa6a1ac2ca00d5f6c6f9241e5b6 1
 
< 0.1%
Other values (388993) 388993
> 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
d0e8265dc7e7b979c1533abebe95402c 1
 
< 0.1%
e77d938381d3c533c24e2159a6af263a 1
 
< 0.1%
f0dccb373dbb52f87993b587cdac5338 1
 
< 0.1%
4f7096fc7703ec9af052a743a2f6e2f4 1
 
< 0.1%
570a1cba31bb2b0f2421740a8b599f6b 1
 
< 0.1%
09e9b830ca58e5c71a23748a3d05f869 1
 
< 0.1%
12cfa522315fd07b4fcf44dd752203f4 1
 
< 0.1%
871b92a678ed78cef4ec01e64a60bcd8 1
 
< 0.1%
5aa5735c69932b6c476b9c8910490e2c 1
 
< 0.1%
75e691168933aafe9e0157ded10c971d 1
 
< 0.1%
Other values (388993) 388993
> 99.9%

Most occurring characters

ValueCountFrequency (%)
9 779268
 
6.3%
7 778763
 
6.3%
c 778747
 
6.3%
3 778533
 
6.3%
4 778318
 
6.3%
d 778169
 
6.3%
1 778128
 
6.3%
5 778046
 
6.3%
8 778030
 
6.3%
a 778030
 
6.3%
Other values (6) 4664064
37.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 7781134
62.5%
Lowercase Letter 4666962
37.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
9 779268
10.0%
7 778763
10.0%
3 778533
10.0%
4 778318
10.0%
1 778128
10.0%
5 778046
10.0%
8 778030
10.0%
6 777811
10.0%
0 777165
10.0%
2 777072
10.0%
Lowercase Letter
ValueCountFrequency (%)
c 778747
16.7%
d 778169
16.7%
a 778030
16.7%
e 778017
16.7%
f 777705
16.7%
b 776294
16.6%

Most occurring scripts

ValueCountFrequency (%)
Common 7781134
62.5%
Latin 4666962
37.5%

Most frequent character per script

Common
ValueCountFrequency (%)
9 779268
10.0%
7 778763
10.0%
3 778533
10.0%
4 778318
10.0%
1 778128
10.0%
5 778046
10.0%
8 778030
10.0%
6 777811
10.0%
0 777165
10.0%
2 777072
10.0%
Latin
ValueCountFrequency (%)
c 778747
16.7%
d 778169
16.7%
a 778030
16.7%
e 778017
16.7%
f 777705
16.7%
b 776294
16.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12448096
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9 779268
 
6.3%
7 778763
 
6.3%
c 778747
 
6.3%
3 778533
 
6.3%
4 778318
 
6.3%
d 778169
 
6.3%
1 778128
 
6.3%
5 778046
 
6.3%
8 778030
 
6.3%
a 778030
 
6.3%
Other values (6) 4664064
37.5%

unix_time
Real number (ℝ)

Distinct387001
Distinct (%)99.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.3492679 × 10^9
Minimum1.3253761 × 10^9
Maximum1.3718168 × 10^9
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.0 MiB

Quantile statistics

Minimum1.3253761 × 10^9
5-th percentile1.3286996 × 10^9
Q11.3387632 × 10^9
median1.3492668 × 10^9
Q31.3594863 × 10^9
95-th percentile1.3698414 × 10^9
Maximum1.3718168 × 10^9
Range46440765
Interquartile range (IQR)20723125

Descriptive statistics

Standard deviation12844910
Coefficient of variation (CV)0.0095199102
Kurtosis-1.0883381
Mean1.3492679 × 10^9
Median Absolute Deviation (MAD)10386412
Skewness0.0022487089
Sum5.2486927 × 10^14
Variance1.649917 × 10^14
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1342848987 3
 
< 0.1%
1348375471 3
 
< 0.1%
1339879107 3
 
< 0.1%
1347630495 3
 
< 0.1%
1344278185 3
 
< 0.1%
1344083264 3
 
< 0.1%
1331300462 2
 
< 0.1%
1349529879 2
 
< 0.1%
1337965573 2
 
< 0.1%
1333834607 2
 
< 0.1%
Other values (386991) 388977
> 99.9%
ValueCountFrequency (%)
1325376051 1
< 0.1%
1325376282 1
< 0.1%
1325376308 1
< 0.1%
1325376383 1
< 0.1%
1325376416 1
< 0.1%
1325376543 1
< 0.1%
1325376788 1
< 0.1%
1325376877 1
< 0.1%
1325377060 1
< 0.1%
1325377356 1
< 0.1%
ValueCountFrequency (%)
1371816816 1
< 0.1%
1371816683 1
< 0.1%
1371816512 1
< 0.1%
1371816488 1
< 0.1%
1371816474 1
< 0.1%
1371816383 1
< 0.1%
1371816372 1
< 0.1%
1371816179 1
< 0.1%
1371815931 1
< 0.1%
1371815849 1
< 0.1%
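
The unix_time values are easier to read once decoded. The sketch below converts epoch seconds to a UTC timestamp with the standard library; the helper name epoch_to_utc is hypothetical. Interpreted as seconds since 1970-01-01 UTC, the smallest value listed above (1325376051) falls on 2012-01-01, and the first sample row's value (1343233235) falls on 2012-07-25 16:20:35, while that row's trans_date_trans_time reads 2019-07-25 16:20:35. This suggests the two columns are offset by seven years in this dataset; the report itself does not comment on it.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> datetime:
    """Interpret an integer as seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Minimum unix_time and the first sample row's unix_time from this report.
for value in (1325376051, 1343233235):
    print(value, epoch_to_utc(value).strftime("%Y-%m-%d %H:%M:%S"))
```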

merch_lat
Real number (ℝ)

Distinct384428
Distinct (%)98.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean38.546535
Minimum19.027785
Maximum67.510267
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size14.0 MiB

Quantile statistics

Minimum19.027785
5-th percentile29.73821
Q134.733977
median39.383177
Q341.967087
95-th percentile46.032372
Maximum67.510267
Range48.482482
Interquartile range (IQR)7.23311

Descriptive statistics

Standard deviation5.1194179
Coefficient of variation (CV)0.13281136
Kurtosis0.80996415
Mean38.546535
Median Absolute Deviation (MAD)3.393313
Skewness-0.18271084
Sum14994718
Variance26.20844
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
39.674306 3
 
< 0.1%
43.335386 3
 
< 0.1%
35.465297 3
 
< 0.1%
38.745859 3
 
< 0.1%
41.270604 3
 
< 0.1%
40.822878 3
 
< 0.1%
39.467059 3
 
< 0.1%
41.271468 3
 
< 0.1%
39.866419 3
 
< 0.1%
40.534977 3
 
< 0.1%
Other values (384418) 388973
> 99.9%
ValueCountFrequency (%)
19.027785 1
< 0.1%
19.033288 1
< 0.1%
19.034282 1
< 0.1%
19.034687 1
< 0.1%
19.036312 1
< 0.1%
19.03922 1
< 0.1%
19.04188 1
< 0.1%
19.048001 1
< 0.1%
19.052896 1
< 0.1%
19.054697 1
< 0.1%
ValueCountFrequency (%)
67.510267 1
< 0.1%
67.397018 1
< 0.1%
67.064277 1
< 0.1%
66.835174 1
< 0.1%
66.65822 1
< 0.1%
66.653465 1
< 0.1%
66.645176 1
< 0.1%
66.624674 1
< 0.1%
66.609969 1
< 0.1%
66.598747 1
< 0.1%

merch_long
Real number (ℝ)

Distinct387072
Distinct (%)99.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-90.21776
Minimum-166.67013
Maximum-66.955996
Zeros0
Zeros (%)0.0%
Negative389003
Negative (%)100.0%
Memory size14.0 MiB

Quantile statistics

Minimum-166.67013
5-th percentile-119.30529
Q1-96.896077
median-87.443539
Q3-80.216844
95-th percentile-73.337194
Maximum-66.955996
Range99.714136
Interquartile range (IQR)16.679233

Descriptive statistics

Standard deviation13.779253
Coefficient of variation (CV)-0.15273326
Kurtosis1.8717293
Mean-90.21776
Median Absolute Deviation (MAD)8.243111
Skewness-1.1508036
Sum-35094979
Variance189.86781
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-95.815432 3
 
< 0.1%
-92.175166 3
 
< 0.1%
-72.721662 3
 
< 0.1%
-82.00945 3
 
< 0.1%
-81.440297 3
 
< 0.1%
-83.06618 3
 
< 0.1%
-89.564406 3
 
< 0.1%
-87.247803 2
 
< 0.1%
-90.10778 2
 
< 0.1%
-97.653063 2
 
< 0.1%
Other values (387062) 388976
> 99.9%
ValueCountFrequency (%)
-166.670132 1
< 0.1%
-166.669638 1
< 0.1%
-166.659277 1
< 0.1%
-166.657174 1
< 0.1%
-166.656219 1
< 0.1%
-166.649771 1
< 0.1%
-166.64352 1
< 0.1%
-166.642151 1
< 0.1%
-166.63998 1
< 0.1%
-166.629875 1
< 0.1%
ValueCountFrequency (%)
-66.955996 1
< 0.1%
-66.958751 1
< 0.1%
-66.961923 1
< 0.1%
-66.963975 1
< 0.1%
-66.967742 1
< 0.1%
-66.970769 1
< 0.1%
-66.979887 1
< 0.1%
-66.983261 1
< 0.1%
-66.983329 1
< 0.1%
-66.984433 1
< 0.1%

is_fraud
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size14.0 MiB
0
386751 
1
 
2252

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters389003
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 386751
99.4%
1 2252
 
0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 386751
99.4%
1 2252
 
0.6%

Most occurring characters

ValueCountFrequency (%)
0 386751
99.4%
1 2252
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 389003
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 386751
99.4%
1 2252
 
0.6%

Most occurring scripts

ValueCountFrequency (%)
Common 389003
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 386751
99.4%
1 2252
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 389003
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 386751
99.4%
1 2252
 
0.6%
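
The class counts above can be turned into a single imbalance figure. The sketch below assumes the score is one minus the Shannon entropy of the class shares divided by its maximum, log2 of the number of classes. That definition is an assumption here, although it does reproduce the 94.9% shown in the report's alert for is_fraud. The function name imbalance_score is hypothetical.

```python
import math

# Class counts copied from the Common Values table for is_fraud.
counts = {"0": 386751, "1": 2252}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares."""
    k = len(class_counts)
    if k < 2:
        raise ValueError("at least two classes are required")
    total = sum(class_counts.values())
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in class_counts.values() if c > 0
    )
    return 1 - entropy / math.log2(k)


minority_share = counts["1"] / sum(counts.values())
print(f"minority share: {minority_share:.2%}")
print(f"imbalance score: {imbalance_score(counts):.3f}")
```

A score of 0 would mean perfectly balanced classes and a score near 1 means almost all rows fall in one class, as they do here.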

Interactions

[81 pairwise interaction plots; images not preserved in this text export]

Correlations

variable | cc_num | amt | zip | lat | long | city_pop | unix_time | merch_lat | merch_long | category | gender | state | is_fraud
cc_num | 1.000 | -0.002 | 0.017 | -0.007 | -0.017 | 0.050 | 0.003 | -0.007 | -0.016 | 0.065 | 0.999 | 0.999 | 0.315
amt | -0.002 | 1.000 | 0.001 | 0.013 | 0.000 | -0.025 | 0.001 | 0.013 | 0.000 | 0.021 | 0.000 | 0.000 | 0.000
zip | 0.017 | 0.001 | 1.000 | -0.162 | -0.959 | -0.042 | 0.001 | -0.161 | -0.957 | 0.065 | 0.990 | 0.999 | 0.311
lat | -0.007 | 0.013 | -0.162 | 1.000 | 0.105 | -0.266 | 0.003 | 0.991 | 0.103 | 0.011 | 0.101 | 0.799 | 0.011
long | -0.017 | 0.000 | -0.959 | 0.105 | 1.000 | 0.089 | -0.003 | 0.105 | 0.998 | 0.008 | 0.090 | 0.922 | 0.005
city_pop | 0.050 | -0.025 | -0.042 | -0.266 | 0.089 | 1.000 | -0.003 | -0.265 | 0.088 | 0.013 | 0.090 | 0.314 | 0.006
unix_time | 0.003 | 0.001 | 0.001 | 0.003 | -0.003 | -0.003 | 1.000 | 0.003 | -0.003 | 0.000 | 0.000 | 0.003 | 0.017
merch_lat | -0.007 | 0.013 | -0.161 | 0.991 | 0.105 | -0.265 | 0.003 | 1.000 | 0.103 | 0.011 | 0.103 | 0.813 | 0.010
merch_long | -0.016 | 0.000 | -0.957 | 0.103 | 0.998 | 0.088 | -0.003 | 0.103 | 1.000 | 0.009 | 0.081 | 0.885 | 0.005
category | 0.065 | 0.021 | 0.065 | 0.011 | 0.008 | 0.013 | 0.000 | 0.011 | 0.009 | 1.000 | 0.052 | 0.019 | 0.069
gender | 0.999 | 0.000 | 0.990 | 0.101 | 0.090 | 0.090 | 0.000 | 0.103 | 0.081 | 0.052 | 1.000 | 0.257 | 0.005
state | 0.999 | 0.000 | 0.999 | 0.799 | 0.922 | 0.314 | 0.003 | 0.813 | 0.885 | 0.019 | 0.257 | 1.000 | 0.014
is_fraud | 0.315 | 0.000 | 0.311 | 0.011 | 0.005 | 0.006 | 0.017 | 0.010 | 0.005 | 0.069 | 0.005 | 0.014 | 1.000

variable | amt | lat | long | city_pop | unix_time | merch_lat | merch_long | is_fraud
amt | 1.000 | -0.001 | -0.000 | 0.005 | 0.001 | -0.001 | -0.000 | 0.207
lat | -0.001 | 1.000 | -0.016 | -0.155 | 0.003 | 0.994 | -0.016 | 0.002
long | -0.000 | -0.016 | 1.000 | -0.052 | -0.003 | -0.015 | 0.999 | 0.003
city_pop | 0.005 | -0.155 | -0.052 | 1.000 | -0.001 | -0.154 | -0.052 | 0.002
unix_time | 0.001 | 0.003 | -0.003 | -0.001 | 1.000 | 0.003 | -0.003 | -0.005
merch_lat | -0.001 | 0.994 | -0.015 | -0.154 | 0.003 | 1.000 | -0.015 | 0.002
merch_long | -0.000 | -0.016 | 0.999 | -0.052 | -0.003 | -0.015 | 1.000 | 0.003
is_fraud | 0.207 | 0.002 | 0.003 | 0.002 | -0.005 | 0.002 | 0.003 | 1.000

variable | amt | lat | long | city_pop | unix_time | merch_lat | merch_long | is_fraud
amt | 1.000 | 0.013 | 0.000 | -0.025 | 0.001 | 0.013 | 0.000 | 0.087
lat | 0.013 | 1.000 | 0.105 | -0.266 | 0.003 | 0.991 | 0.103 | 0.002
long | 0.000 | 0.105 | 1.000 | 0.089 | -0.003 | 0.105 | 0.998 | 0.004
city_pop | -0.025 | -0.266 | 0.089 | 1.000 | -0.003 | -0.265 | 0.088 | 0.003
unix_time | 0.001 | 0.003 | -0.003 | -0.003 | 1.000 | 0.003 | -0.003 | -0.004
merch_lat | 0.013 | 0.991 | 0.105 | -0.265 | 0.003 | 1.000 | 0.103 | 0.001
merch_long | 0.000 | 0.103 | 0.998 | 0.088 | -0.003 | 0.103 | 1.000 | 0.004
is_fraud | 0.087 | 0.002 | 0.004 | 0.003 | -0.004 | 0.001 | 0.004 | 1.000

variable | amt | lat | long | city_pop | unix_time | merch_lat | merch_long | is_fraud
amt | 1.000 | 0.009 | 0.000 | -0.017 | 0.001 | 0.009 | 0.000 | 0.071
lat | 0.009 | 1.000 | 0.085 | -0.178 | 0.002 | 0.920 | 0.083 | 0.001
long | 0.000 | 0.085 | 1.000 | 0.062 | -0.002 | 0.084 | 0.966 | 0.004
city_pop | -0.017 | -0.178 | 0.062 | 1.000 | -0.002 | -0.177 | 0.061 | 0.002
unix_time | 0.001 | 0.002 | -0.002 | -0.002 | 1.000 | 0.002 | -0.002 | -0.003
merch_lat | 0.009 | 0.920 | 0.084 | -0.177 | 0.002 | 1.000 | 0.082 | 0.001
merch_long | 0.000 | 0.083 | 0.966 | 0.061 | -0.002 | 0.082 | 1.000 | 0.003
is_fraud | 0.071 | 0.001 | 0.004 | 0.002 | -0.003 | 0.001 | 0.003 | 1.000

variable | cc_num | category | amt | gender | state | zip | lat | long | city_pop | unix_time | merch_lat | merch_long | is_fraud
cc_num | 1.000 | 0.016 | 0.000 | 0.031 | 0.439 | 0.145 | 0.257 | 0.190 | 0.082 | 0.000 | 0.188 | 0.175 | 0.002
category | 0.016 | 1.000 | 0.047 | 0.067 | 0.067 | 0.027 | 0.024 | 0.019 | 0.030 | 0.000 | 0.024 | 0.019 | 0.088
amt | 0.000 | 0.047 | 1.000 | 0.000 | 0.000 | 0.000 | 0.007 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
gender | 0.031 | 0.067 | 0.000 | 1.000 | 0.323 | 0.159 | 0.135 | 0.109 | 0.120 | 0.000 | 0.137 | 0.108 | 0.007
state | 0.439 | 0.067 | 0.000 | 0.323 | 1.000 | 1.000 | 0.965 | 0.990 | 0.657 | 0.010 | 0.969 | 0.985 | 0.018
zip | 0.145 | 0.027 | 0.000 | 0.159 | 1.000 | 1.000 | 0.679 | 0.853 | 0.271 | 0.003 | 0.674 | 0.845 | 0.001
lat | 0.257 | 0.024 | 0.007 | 0.135 | 0.965 | 0.679 | 1.000 | 0.884 | 0.296 | 0.007 | 0.992 | 0.865 | 0.015
long | 0.190 | 0.019 | 0.000 | 0.109 | 0.990 | 0.853 | 0.884 | 1.000 | 0.317 | 0.002 | 0.912 | 0.996 | 0.008
city_pop | 0.082 | 0.030 | 0.000 | 0.120 | 0.657 | 0.271 | 0.296 | 0.317 | 1.000 | 0.007 | 0.270 | 0.332 | 0.008
unix_time | 0.000 | 0.000 | 0.000 | 0.000 | 0.010 | 0.003 | 0.007 | 0.002 | 0.007 | 1.000 | 0.007 | 0.004 | 0.022
merch_lat | 0.188 | 0.024 | 0.000 | 0.137 | 0.969 | 0.674 | 0.992 | 0.912 | 0.270 | 0.007 | 1.000 | 0.895 | 0.014
merch_long | 0.175 | 0.019 | 0.000 | 0.108 | 0.985 | 0.845 | 0.865 | 0.996 | 0.332 | 0.004 | 0.895 | 1.000 | 0.006
is_fraud | 0.002 | 0.088 | 0.000 | 0.007 | 0.018 | 0.001 | 0.015 | 0.008 | 0.008 | 0.022 | 0.014 | 0.006 | 1.000

variable | state | is_fraud | gender | category
state | 1.000 | 0.014 | 0.257 | 0.019
is_fraud | 0.014 | 1.000 | 0.005 | 0.069
gender | 0.257 | 0.005 | 1.000 | 0.052
category | 0.019 | 0.069 | 0.052 | 1.000
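
Strongly related pairs can be picked out of a correlation matrix mechanically. The sketch below does this for an excerpt of the second matrix above (the eight-variable one in which amt and is_fraud correlate at 0.207). The function name strong_pairs and the 0.5 cut-off are illustrative assumptions, not settings taken from the report.

```python
# Upper-triangle excerpt of the second matrix above, keyed by variable pair.
corr = {
    ("lat", "long"): -0.016,
    ("lat", "merch_lat"): 0.994,
    ("lat", "merch_long"): -0.016,
    ("lat", "is_fraud"): 0.002,
    ("long", "merch_lat"): -0.015,
    ("long", "merch_long"): 0.999,
    ("long", "is_fraud"): 0.003,
    ("merch_lat", "merch_long"): -0.015,
    ("merch_lat", "is_fraud"): 0.002,
    ("merch_long", "is_fraud"): 0.003,
}


def strong_pairs(pairs, threshold=0.5):
    """Return the variable pairs whose absolute coefficient meets the threshold."""
    return sorted(pair for pair, r in pairs.items() if abs(r) >= threshold)


print(strong_pairs(corr))
```

In this excerpt only the cardholder and merchant coordinates pair up strongly, which suggests that merchants in this dataset are located close to the cardholder's address.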

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

index | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud
468195 | 2019-07-25 16:20:35 | 4587657402165341815 | fraud_Hills-Witting | shopping_net | 265.89 | Amber | Lewis | F | 6296 John Keys Suite 858 | Pembroke Township | IL | 60958 | 41.0646 | -87.5917 | 2135 | Psychotherapist, child | 2004-05-08 | d0e8265dc7e7b979c1533abebe95402c | 1343233235 | 40.991185 | -88.538586 | 0
694039 | 2019-10-23 04:47:17 | 347612609554823 | fraud_Kling Inc | gas_transport | 68.21 | Robert | James | M | 18316 Cannon Place | Newport | ME | 4953 | 44.8393 | -69.2675 | 3228 | Lexicographer | 1995-12-28 | 71a947bf4d90e76e4d5b9a5f1b1ec8b4 | 1350967637 | 44.680256 | -69.390510 | 0
1092424 | 2020-03-30 09:50:38 | 6011652924285713 | fraud_DuBuque LLC | grocery_pos | 95.39 | Kathryn | Smith | F | 19838 Tonya Prairie Apt. 947 | Rocky Mount | MO | 65072 | 38.2911 | -92.7059 | 1847 | Tax inspector | 1988-10-26 | dfda32052a68f1452b1190b182672cb0 | 1364637038 | 38.954895 | -91.927764 | 0
47679 | 2019-01-28 22:32:55 | 4839615922685395 | fraud_Grimes LLC | entertainment | 21.39 | Phillip | Robertson | M | 85344 Smith Gateway Apt. 280 | Harrodsburg | IN | 47434 | 39.0130 | -86.5457 | 76 | Social researcher | 1955-05-06 | 8858dbc699716a100343e8402c3d2d17 | 1327789975 | 39.440657 | -85.829947 | 0
226502 | 2019-04-24 15:36:30 | 4989847570577635369 | fraud_Ullrich Ltd | kids_pets | 36.53 | Vanessa | Anderson | F | 21178 Brittney Locks | Prosperity | SC | 29127 | 34.1832 | -81.5324 | 8333 | Archaeologist | 1994-07-09 | b73f49b5c7081b864f1ba77678d86fce | 1335281790 | 34.267058 | -80.615114 | 0
610017 | 2019-09-16 03:13:48 | 4939976756738216 | fraud_Rodriguez, Yost and Jenkins | misc_net | 259.30 | Michelle | Johnston | F | 3531 Hamilton Highway | Roma | TX | 78584 | 26.4215 | -99.0025 | 18128 | IT trainer | 1990-11-07 | f0dccb373dbb52f87993b587cdac5338 | 1347765228 | 27.403295 | -99.484687 | 0
1161197 | 2020-04-28 23:06:17 | 4334230547694630 | fraud_Padberg-Sauer | home | 25.58 | Scott | Martin | M | 7483 Navarro Flats | Freedom | WY | 83120 | 43.0172 | -111.0292 | 471 | Education officer, museum | 1967-08-02 | 4f7096fc7703ec9af052a743a2f6e2f4 | 1367190377 | 43.862172 | -110.363955 | 0
174936 | 2019-04-01 14:28:52 | 30074693890476 | fraud_Stiedemann Ltd | food_dining | 64.68 | Kelsey | Richards | F | 889 Sarah Station Suite 624 | Holcomb | KS | 67851 | 37.9931 | -100.9893 | 2691 | Arboriculturist | 1993-08-16 | 570a1cba31bb2b0f2421740a8b599f6b | 1333290532 | 38.967083 | -100.174342 | 0
1054838 | 2020-03-13 23:32:59 | 630423337322 | fraud_Jacobi and Sons | shopping_pos | 7.20 | Stephanie | Gill | F | 43039 Riley Greens Suite 393 | Orient | WA | 99160 | 48.8878 | -118.2105 | 149 | Special educational needs teacher | 1978-06-21 | 09e9b830ca58e5c71a23748a3d05f869 | 1363217579 | 48.743502 | -118.917042 | 0
255480 | 2019-05-06 23:45:27 | 30143535920989 | fraud_Kuhn Group | food_dining | 12.49 | Lisa | Collins | F | 44197 Jeffrey Port Suite 050 | Bridgeport | NJ | 8014 | 39.8016 | -75.3478 | 504 | Engineer, control and instrumentation | 1980-08-17 | 12cfa522315fd07b4fcf44dd752203f4 | 1336347927 | 39.041952 | -76.224140 | 0

index | trans_date_trans_time | cc_num | merchant | category | amt | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | trans_num | unix_time | merch_lat | merch_long | is_fraud
558028 | 2019-08-25 23:40:51 | 4128027264554082 | fraud_Schmitt Ltd | misc_net | 851.68 | Kyle | Park | M | 7507 Larry Passage Suite 859 | Mount Perry | OH | 43760 | 39.8788 | -82.1880 | 1831 | Barrister's clerk | 1953-10-18 | 1737ee3f0ef76a932163fb04546137a2 | 1345938051 | 40.151813 | -81.664476 | 1
1010261 | 2020-02-20 05:52:45 | 2233882705243596 | fraud_Bauch-Raynor | grocery_pos | 344.93 | Jamie | Robinson | F | 67089 Caitlin Meadow Apt. 905 | Sturgis | MS | 39769 | 33.3570 | -89.0473 | 1923 | Medical physicist | 1960-01-16 | dac481866f20ffe19b1725cd2275c7be | 1361339565 | 33.339171 | -88.613480 | 1
928889 | 2020-01-03 23:52:39 | 4756039869079882102 | fraud_Block-Parisian | misc_net | 757.90 | Francisco | Hernandez | M | 980 Smith Gardens | Gainesville | TX | 76240 | 33.6547 | -97.1583 | 26120 | Engineer, manufacturing | 1954-01-06 | 043e396418a61a663559a4f65e99de74 | 1357257159 | 33.937975 | -96.382509 | 1
1101557 | 2020-04-03 10:59:44 | 345832460465610 | fraud_Robel, Cummerata and Prosacco | gas_transport | 9.11 | Jason | Mcmahon | M | 6385 Donald Square Suite 429 | Springfield | VA | 22151 | 38.8029 | -77.2116 | 104396 | Production engineer | 1950-11-20 | b991672c504a79471a46d05f02ca4109 | 1364986784 | 38.454841 | -78.157858 | 1
570534 | 2019-08-30 23:32:57 | 30596478689301 | fraud_Kassulke PLC | shopping_net | 967.49 | Daniel | Graham | M | 28223 Ward Summit Apt. 664 | Arvada | CO | 80005 | 39.8422 | -105.1097 | 122111 | Hotel manager | 1987-05-23 | 9458505d34c58daa2893a825ce9aec3c | 1346369577 | 39.262510 | -105.656175 | 1
567879 | 2019-08-29 22:07:20 | 3526826139003047 | fraud_Block-Parisian | misc_net | 853.27 | Nathan | Massey | M | 5783 Evan Roads Apt. 465 | Falmouth | MI | 49632 | 44.2529 | -85.0170 | 1126 | Furniture designer | 1955-07-06 | 91b28f3102c9055893488d23afde134e | 1346278040 | 43.357421 | -85.089403 | 1
1000983 | 2020-02-15 02:30:12 | 30235438713303 | fraud_Durgan-Auer | misc_net | 750.98 | James | Baldwin | M | 3603 Mitchell Court | Winfield | WV | 25213 | 38.5072 | -81.8900 | 5512 | Exhibition designer | 1980-03-24 | e21fba3f053af7ea0bd31dfed1ddf2e4 | 1360895412 | 37.557927 | -81.170028 | 1
588497 | 2019-09-07 04:27:01 | 377264520876399 | fraud_Schoen, Kuphal and Nitzsche | grocery_pos | 309.71 | Kara | Miles | F | 2076 Thomas Roads Suite 970 | Cassatt | SC | 29032 | 34.3424 | -80.5000 | 4424 | Lawyer | 1961-07-31 | 0cdec150cbb24a0e689a11001e9f0b15 | 1346992021 | 34.178315 | -79.758965 | 1
1233466 | 2020-05-30 02:45:14 | 6011681934117244 | fraud_Koepp-Parker | grocery_pos | 328.28 | Kaitlyn | Newman | F | 098 Stewart Hill | Slayden | TN | 37165 | 36.2835 | -87.4581 | 70 | Prison officer | 1956-06-22 | c8a8ae7f2d176c8235b20babb0ee8b80 | 1369881914 | 36.153313 | -86.646337 | 1
1046846 | 2020-03-10 01:28:30 | 3589289942931264 | fraud_Marks Inc | gas_transport | 7.61 | Paula | Estrada | F | 350 Stacy Glens | Spencer | SD | 57374 | 43.7557 | -97.5936 | 343 | Development worker, international aid | 1972-03-05 | 2cbedfecdb3594a19c965e57b1b885c0 | 1362878910 | 43.356988 | -96.770897 | 1
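
Each sample row carries two coordinate pairs: lat/long for the cardholder and merch_lat/merch_long for the merchant. As a hedged illustration of how the two relate, the sketch below computes the great-circle distance between them with the haversine formula, using the first sample row above. The function name haversine_km and the mean Earth radius of 6371.0088 km are assumptions of this example, not part of the report.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# First sample row: cardholder (lat, long) and merchant (merch_lat, merch_long).
distance = haversine_km(41.0646, -87.5917, 40.991185, -88.538586)
print(f"{distance:.1f} km")
```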